Section: New Results

Visual Perception

Visual Tracking for Motion Capture

Participant : Eric Marchand.

This work is carried out in collaboration with Anatole Lécuyer (Inria Hybrid group) through the co-supervision of the Ph.D. thesis of Guillaume Cortes.

In the context of the development of new optical tracking devices, we propose an approach to greatly increase the tracking workspace of VR applications without adding new sensors [69]. Our approach relies on controlled cameras that follow the tracked markers all around the VR workspace, providing 6-DoF tracking data. We designed a proof of concept of this approach based on two consumer-grade cameras and a pan-tilt head. The approach has also been extended to the tracking of a drone in a GPS-denied environment [42].
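
As a rough illustration of the camera-control side of such a system (the function name, gains and sign conventions are hypothetical and not those of the implementation in [69]), the following sketch computes pan/tilt rates that re-center a tracked marker in the image so that the camera keeps following it:

import numpy as np

def pan_tilt_command(u, v, K, gain=1.0):
    """Return (pan_rate, tilt_rate) driving the marker pixel (u, v) to the image center."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) / fx          # normalized image coordinates of the marker
    y = (v - cy) / fy
    # Signs assume that a positive pan (resp. tilt) rotates the optical axis
    # toward positive image x (resp. y); adapt to the actual mounting.
    return gain * np.arctan(x), gain * np.arctan(y)

K = np.array([[600., 0., 320.],
              [0., 600., 240.],
              [0., 0., 1.]])
print(pan_tilt_command(400., 200., K))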

We also conducted a short study analyzing the 3D motion of the head and hand in CAVE-based applications, with the goal of optimizing the placement of optical tracking sensors [43].

Object 3D Tracking based on Depth Information and CAD Model

Participants : Agniva Sengupta, Eric Marchand, Alexandre Krupa.

In the context of the iProcess project (see Section 9.3.3.2), we started this year a new study on the pose estimation and tracking of a rigid object observed by an RGB-D camera. We developed a pose estimation approach based on depth measurements and on a CAD model represented by a 3D tetrahedral mesh. The pose parameters are estimated through an iterative optimization process that minimizes the point-to-plane Euclidean distance between the point cloud observed by the RGB-D camera and the surface of the 3D mesh. Preliminary results obtained with simple objects composed of a set of orthogonal planes showed the good performance of this approach. However, the method failed for complex objects exhibiting highly curved surfaces. To address this issue, we are currently extending the approach so that the RGB information is also taken into account in the optimization criterion.
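
As a minimal sketch of the optimization principle (assuming that point-to-surface correspondences and surface normals from the mesh are already available; the function and variable names are illustrative, not those of the actual implementation), one linearized Gauss-Newton step of the point-to-plane minimization could look as follows:

import numpy as np

def point_to_plane_step(P_obs, Q_model, N_model):
    """One linearized Gauss-Newton update of the pose from observed points P_obs,
    their closest model-surface points Q_model and the surface normals N_model."""
    A, b = [], []
    for p, q, n in zip(P_obs, Q_model, N_model):
        # Point-to-plane residual n.(p - q), linearized w.r.t. a small motion (omega, v).
        A.append(np.hstack((np.cross(p, n), n)))
        b.append(-n.dot(p - q))
    xi, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    omega, v = xi[:3], xi[3:]
    # First-order rotation update (Rodrigues' formula should be used for larger angles).
    R = np.eye(3) + np.array([[0, -omega[2], omega[1]],
                              [omega[2], 0, -omega[0]],
                              [-omega[1], omega[0], 0]])
    return R, v

# Toy example: points on three orthogonal planes, observed with a small offset.
Q = np.array([[0.2, 0.3, 1.0], [0.7, 0.6, 1.0],   # plane z = 1
              [1.0, 0.2, 0.4], [1.0, 0.8, 0.3],   # plane x = 1
              [0.3, 1.0, 0.6], [0.6, 1.0, 0.2]])  # plane y = 1
N = np.array([[0., 0., 1.], [0., 0., 1.],
              [1., 0., 0.], [1., 0., 0.],
              [0., 1., 0.], [0., 1., 0.]])
P = Q + np.array([0.02, -0.03, 0.05])
R, v = point_to_plane_step(P, Q, N)
print(R, v)   # v is approximately the negative of the applied offset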

General Model-based Tracker

Participants : Souriya Trinh, Fabien Spindler, François Chaumette.

We have generalized the model-based tracker [2] available in ViSP [5] to integrate the depth information provided by an RGB-D sensor using the method described in the previous paragraph. It is now possible to fuse in the same optimization scheme measurements such as points of interest, edges, and depth, which improves the robustness and accuracy of the tracker.
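
Conceptually, fusing these heterogeneous measurements amounts to stacking the per-modality residuals and Jacobians in a single weighted Gauss-Newton step, as in the sketch below (the Jacobians, residuals and weights are assumed to be already computed and are filled with random values here purely for illustration; this is not the ViSP API):

import numpy as np

def fused_gauss_newton_step(blocks):
    """blocks: list of (J, e, w) tuples; returns the 6-vector pose increment."""
    H = np.zeros((6, 6))
    g = np.zeros(6)
    for J, e, w in blocks:
        H += w * J.T @ J       # accumulate the weighted normal equations
        g += w * J.T @ e
    return -np.linalg.solve(H, g)

rng = np.random.default_rng(0)
J_edge, e_edge = rng.normal(size=(40, 6)), rng.normal(size=40)
J_klt, e_klt = rng.normal(size=(30, 6)), rng.normal(size=30)
J_depth, e_depth = rng.normal(size=(50, 6)), rng.normal(size=50)
delta = fused_gauss_newton_step([(J_edge, e_edge, 1.0),
                                 (J_klt, e_klt, 0.5),
                                 (J_depth, e_depth, 2.0)])
print(delta)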

3D Localization for Airplane Landing

Participants : Noël Mériaux, Pierre-Marie Kerzerho, Patrick Rives, Eric Marchand, François Chaumette.

This study was realized in the scope of the ANR VisioLand project (see Section 9.2.2). In a first step, we considered and adapted our model-based tracker [2] to localize the aircraft with respect to the airport surroundings. Satisfactory results have been obtained on real image sequences provided by Airbus. In a second step, we implemented a direct registration method based on dense vision-based tracking that localizes the on-board camera from a set of keyframe images corresponding to the landing trajectory. First experiments with simulated and real images have been carried out, with promising results. This approach is particularly interesting at the beginning of the descent, when the runway is far away and barely observable in the image. In that sense, the direct registration method is strongly complementary to the model-based approach studied before.
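
The principle of the dense, direct registration can be illustrated with the toy sketch below (2-D integer translation only, brute-force search; the real method estimates the full camera pose with respect to the keyframes, so this only conveys the idea of minimizing a photometric error without extracting features):

import numpy as np

def photometric_error(I_cur, I_key, shift):
    """Sum of squared intensity differences after shifting I_cur by `shift` pixels."""
    dx, dy = shift
    I_warp = np.roll(np.roll(I_cur, dy, axis=0), dx, axis=1)
    return float(np.sum((I_warp - I_key) ** 2))

def align(I_cur, I_key, search=3):
    """Brute-force search of the integer shift minimizing the photometric error."""
    best = min(((photometric_error(I_cur, I_key, (dx, dy)), (dx, dy))
                for dx in range(-search, search + 1)
                for dy in range(-search, search + 1)))
    return best[1]

rng = np.random.default_rng(1)
I_key = rng.random((64, 64))
I_cur = np.roll(I_key, 2, axis=1)   # current image = keyframe shifted by 2 pixels
print(align(I_cur, I_key))          # -> (-2, 0): the shift bringing them back into registration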

Extrinsic Calibration of Multiple RGB-D Cameras

Participants : Eduardo Fernandez Moral, Patrick Rives.

In collaboration with Alejandro Perez-Yus from the University of Zaragoza, we developed a novel method to estimate the relative poses between RGB and depth cameras without requiring an overlapping field of view, thus providing flexibility to calibrate a variety of sensor configurations. This calibration problem is relevant to robotic applications that can benefit from using several cameras to increase the field of view. In our approach, we extract and match lines of the scene in the RGB and depth cameras, and impose geometric constraints to find the relative poses between the sensors. In [31], an analysis of the observability properties of the problem is presented. We have validated our method in both synthetic and real scenarios with different camera configurations, demonstrating that our approach achieves good accuracy and is very simple to apply, in contrast with previous methods based on trajectory matching using visual odometry or SLAM.
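
The kind of geometric constraint involved can be sketched as follows (this is only an illustration under simplifying assumptions, not the published solver): each correspondence pairs a 3-D line (point p, unit direction d) measured in the depth-camera frame with the back-projection plane normal n of the matching 2-D line in the RGB camera, and the relative pose (R, t) must map the 3-D line into that plane, i.e. n·(R d) = 0 and n·(R p + t) = 0.

import numpy as np
from scipy.optimize import least_squares

def rodrigues(w):
    """Rotation matrix from a rotation vector."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

def residuals(x, lines_depth, normals_rgb):
    R, t = rodrigues(x[:3]), x[3:]
    res = []
    for (p, d), n in zip(lines_depth, normals_rgb):
        res.append(n @ (R @ d))        # the line direction lies in the back-projection plane
        res.append(n @ (R @ p + t))    # a point of the line lies in the back-projection plane
    return np.asarray(res)

# Toy data: a ground-truth pose is used to synthesize the plane normals.
rng = np.random.default_rng(2)
R_true, t_true = rodrigues(np.array([0.05, -0.1, 0.2])), np.array([0.1, 0.0, -0.05])
lines, normals = [], []
for _ in range(8):
    p, d = rng.normal(size=3), rng.normal(size=3)
    d /= np.linalg.norm(d)
    q = R_true @ p + t_true
    n = np.cross(q, R_true @ d)        # plane through the RGB optical center containing the line
    normals.append(n / np.linalg.norm(n))
    lines.append((p, d))
sol = least_squares(residuals, np.zeros(6), args=(lines, normals))
print(sol.x)   # should approach [0.05, -0.1, 0.2, 0.1, 0.0, -0.05]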

Scene Registration with Large Convergence Domain

Participants : Renato José Martins, Patrick Rives.

Image registration has been a major problem in computer vision over the past decades. It consists in searching a database of previously acquired images to find one (or several) that fulfills some degree of similarity, e.g., an image of the same scene seen from a similar viewpoint. This problem is of interest in mobile robotics for topological mapping, re-localization, loop closure and object identification. Scene registration can be seen as a generalization of this problem where the representation to match is not necessarily defined by a single image (i.e., the information may come from different images and/or sensors), attempting to exploit all available information to reach higher performance and flexibility. This problem is ubiquitous in robot localization and navigation. We propose a probabilistic framework to improve the accuracy and efficiency of a previous solution for structure registration based on a planar representation [12]. The main idea is to exploit the properties of co-visible planar surfaces and their normals seen from two distinct viewpoints. We estimate, in two decoupled stages, the rotation and then the translation, both based on the orientation of the normal vectors and on the depth. These two stages are efficiently computed using low-resolution depth images and without any feature extraction/matching. In [53], we also analyze the limitations and observability of this approach, and its relationship to point-to-plane ICP. Notably, if the rotation is observable, at least five DoF can be estimated in the worst case. To demonstrate the effectiveness of the method, we evaluate the initialization technique in a set of challenging scenarios, comprising simulated spherical images from the Sponza Atrium model benchmark and real spherical indoor sequences.
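
The decoupled two-stage idea can be illustrated with the following sketch (assuming plane correspondences with normals and plane-to-origin distances are given; this only conveys the principle, not the probabilistic framework of [53]): the rotation is obtained by aligning the two sets of normals, and the translation from the resulting change in plane distances.

import numpy as np

def rotation_from_normals(N_a, N_b):
    """Rotation R best mapping the rows of N_a onto the rows of N_b (Kabsch/SVD)."""
    U, _, Vt = np.linalg.svd(N_b.T @ N_a)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])   # guard against reflections
    return U @ D @ Vt

def translation_from_planes(N_b, d_a, d_b):
    """Solve n_b . t = d_b - d_a (change of plane-to-origin distance) in least squares."""
    t, *_ = np.linalg.lstsq(N_b, d_b - d_a, rcond=None)
    return t

# Toy example: three orthogonal planes seen from two viewpoints.
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.],
                   [np.sin(a),  np.cos(a), 0.],
                   [0., 0., 1.]])
t_true = np.array([0.2, -0.1, 0.4])
N_a = np.eye(3)                    # plane normals in the first view
d_a = np.array([1.0, 2.0, 3.0])    # plane-to-origin distances in the first view
N_b = (R_true @ N_a.T).T           # the same planes expressed in the second view
d_b = d_a + N_b @ t_true
R_est = rotation_from_normals(N_a, N_b)
t_est = translation_from_planes(N_b, d_a, d_b)
print(np.allclose(R_est, R_true), t_est)   # True [ 0.2 -0.1  0.4]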

Scene Semantization based on Deep Learning Approach

Participants : Eduardo Fernandez Moral, Patrick Rives.

Semantic segmentation of images is an important problem for mobile robotics and autonomous driving because it provides basic information that can be used for complex reasoning and safe navigation. This problem constitutes a very active field of research, where the state of the art evolves continuously with new strategies based on different kinds of deep neural networks for image segmentation and classification. RGB-D images are starting to be employed as well for the same purpose, to exploit the complementary information of color and geometry. The Lagadic team has explored several strategies to increase the performance and accuracy of semantic segmentation from RGB-D images. We propose a multi-pipeline architecture to effectively exploit the complementary information of RGB-D images and thus improve the semantic segmentation results. The multi-pipeline architecture processes the color and depth layers in parallel, before concatenating their feature maps to produce the final semantic prediction. Our results are evaluated on public benchmark datasets to show the improved accuracy of the proposed architecture [46]. Though we address this problem in the context of urban image segmentation, our results can also be extended to other contexts, like indoor scenarios and domestic robotics.
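
A hypothetical miniature version of such a two-pipeline network is sketched below (PyTorch is used here purely for illustration; the layer sizes, names and framework are not those of the model evaluated in [46]), only to show the "process color and depth in parallel, then concatenate the feature maps" idea:

import torch
import torch.nn as nn

class TinyRGBDSegNet(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.rgb_branch = branch(3)     # color pipeline
        self.depth_branch = branch(1)   # depth pipeline
        # Fusion: concatenate the two feature maps, then predict per-pixel class logits.
        self.head = nn.Conv2d(64, n_classes, 1)

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.head(f)             # (batch, n_classes, H, W) logits

net = TinyRGBDSegNet()
logits = net(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
print(logits.shape)   # torch.Size([1, 5, 64, 64])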

Our research is partly motivated by the need for semantic segmentation solutions with better segmentation around contours. Besides, we note that one of the main issues when comparing different neural network architectures is how to select an appropriate metric to evaluate their accuracy. We have studied several metrics for multi-class classification, and we propose a new metric that accounts for both global and contour accuracy in a simple formulation, to overcome the weaknesses of previous metrics. This metric is based on the Jaccard index and explicitly takes into account the distance to the border regions of the different classes, to jointly encode the rate of correctly labeled pixels and how homeomorphic the segmentation is to the real object boundaries. We also present a comparative analysis of our proposed metric and several commonly used metrics for semantic segmentation, together with a statistical analysis of their correlation.
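
A simplified boundary-aware Jaccard-type score is sketched below to illustrate the idea (this is not the exact formulation of the proposed metric; the weighting function and its parameter are hypothetical): each pixel is weighted by its distance to the ground-truth class contour, so that errors close to the object boundaries weigh more heavily than in a plain Jaccard index.

import numpy as np
from scipy.ndimage import distance_transform_edt

def boundary_weighted_jaccard(pred, gt, cls, decay=5.0):
    p, g = (pred == cls), (gt == cls)
    # Distance of every pixel to the ground-truth boundary of the class.
    dist = np.minimum(distance_transform_edt(g), distance_transform_edt(~g))
    w = np.exp(-dist / decay)          # pixels near the contour receive the largest weight
    inter = np.sum(w * (p & g))
    union = np.sum(w * (p | g))
    return inter / union if union > 0 else 1.0

gt = np.zeros((32, 32), dtype=int); gt[8:24, 8:24] = 1
pred = np.zeros_like(gt); pred[9:25, 9:25] = 1   # prediction shifted by one pixel
print(boundary_weighted_jaccard(pred, gt, cls=1))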

Online Localization and Mapping for UAVs

Participants : Muhammad Usman, Paolo Robuffo Giordano.

Localization and mapping in unknown environments is still an open problem, in particular for UAVs, because of the limited memory and processing power typically available onboard. In order to provide our quadrotor UAVs with high autonomy, we started studying how to exploit onboard cameras for an accurate (but fast) localization and mapping in unknown indoor environments. We chose to base both processes on the newly available Semi-Direct Visual Odometry (SVO) library (http://rpg.ifi.uzh.ch/software), which has gained considerable attention in the robotics community over the last years. The idea is to exploit dense images (i.e., with little image pre-processing) to obtain an incremental update of the camera pose which, when integrated over time, provides the camera localization (pose) w.r.t. the initial frame. In order to reduce drift during motion, a concurrent mapping thread is also used to compare the current view with a set of keyframes (taken at regular steps during motion) which constitute a “map” of the environment. We have started porting the SVO library to our UAVs, and preliminary results showed good localization accuracy against the Vicon ground truth. We now plan to close the loop and base the UAV flight on the pose reconstructed by the SVO algorithm.
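
The interplay between the incremental odometry updates and the keyframe map can be conveyed with the conceptual sketch below (the class, thresholds and interfaces are hypothetical and unrelated to the SVO implementation): the odometry increment is composed over time, but the pose is expressed relative to the most recent keyframe so that drift only accumulates between keyframes.

import numpy as np

def se3(R=np.eye(3), t=np.zeros(3)):
    T = np.eye(4); T[:3, :3] = R; T[:3, 3] = t
    return T

class KeyframeLocalizer:
    def __init__(self, keyframe_dist=0.5):
        self.keyframes = [se3()]          # map: poses of the keyframes w.r.t. the world frame
        self.T_kf_cur = se3()             # current pose w.r.t. the active keyframe
        self.keyframe_dist = keyframe_dist

    def update(self, T_increment):
        """Integrate one odometry increment (pose of the current frame w.r.t. the previous one)."""
        self.T_kf_cur = self.T_kf_cur @ T_increment
        if np.linalg.norm(self.T_kf_cur[:3, 3]) > self.keyframe_dist:
            # Spawn a new keyframe at the current pose and reset the relative pose.
            self.keyframes.append(self.keyframes[-1] @ self.T_kf_cur)
            self.T_kf_cur = se3()
        return self.keyframes[-1] @ self.T_kf_cur   # pose w.r.t. the world frame

loc = KeyframeLocalizer()
step = se3(t=np.array([0.2, 0.0, 0.0]))   # constant forward motion of 0.2 m per frame
for _ in range(10):
    T_wc = loc.update(step)
print(T_wc[:3, 3], len(loc.keyframes))     # ~[2, 0, 0] after a handful of keyframes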

Reflectance and Illumination Estimation for Realistic Augmented Reality

Participants : Salma Jiddi, Eric Marchand.

A key factor for realistic Augmented Reality is a correct illumination simulation. This consists in estimating the characteristics of the real light sources and using them to model the virtual lighting. This year, we studied a novel method for recovering both the 3D position and the intensity of multiple light sources using detected cast shadows. Our algorithm has been successfully tested on a set of real scenes where virtual objects have visually coherent shadows [70].
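
The underlying geometry can be illustrated as follows (an illustrative sketch only, not the published algorithm): with a point light, an object point, its cast shadow on the ground and the light itself are collinear, so the light position can be recovered as the 3-D point closest, in the least-squares sense, to the rays drawn from the shadow points through the corresponding object points.

import numpy as np

def light_from_shadows(points, shadows):
    A = np.zeros((3, 3)); b = np.zeros(3)
    for P, S in zip(points, shadows):
        d = (P - S) / np.linalg.norm(P - S)    # ray from the shadow through the object point
        M = np.eye(3) - np.outer(d, d)         # projector orthogonal to the ray
        A += M; b += M @ S
    return np.linalg.solve(A, b)

# Toy scene: a light at (1, 2, 3) and object points above the ground plane z = 0.
L = np.array([1.0, 2.0, 3.0])
pts = np.array([[0.0, 0.0, 1.0], [2.0, 1.0, 0.5], [1.5, -1.0, 1.2]])
shad = [P + (P - L) * (P[2] / (L[2] - P[2])) for P in pts]   # shadows projected onto z = 0
print(light_from_shadows(pts, shad))   # ~[1, 2, 3]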

Optimal Active Sensing Control

Participants : Marco Cognetti, Paolo Salaris, Paolo Robuffo Giordano.

This study concerns the problem of active sensing control, whose objective is to reduce the estimation uncertainty of an observer as much as possible by determining the inputs of the system that maximize the amount of information gathered from the few noisy outputs, while at the same time reducing the negative effects of the process/actuation noise. The latter is far from negligible in several robotic applications (a prominent example being aerial vehicles).

Last year, we extended a previous work [9] to the case where the observability property is not instantaneously guaranteed, and hence the optimal estimation strategy cannot be given in terms of the instantaneous velocity direction of the robot and, consequently, of the onboard sensors. The outcomes of this research have been presented in [61] for nonlinear differentially flat systems. This year, we have taken several steps forward to improve and generalize the work in [61]. First of all, we have replaced the Observability Gramian (OG) with the Constructibility Gramian (CG). Despite their similar form, they differ in that the OG measures the information collected along the path about the initial state of the nonlinear system, while the CG measures the information about the current/final state, with which most robotics applications are more concerned. Second, we have overcome a limitation of the previous work [61], which only deals with the case where the OG and the CG are known in closed form. We have also applied our method to the unicycle vehicle, which is a more complex dynamic system than the one used in [61], and tested our machinery on the cases of self-calibration and environment reconstruction. Moreover, thanks to the arrival of Marco Cognetti in our group as a post-doc, we are currently working on the application of our method to a quadrotor UAV, a much more complex dynamic system for which the CG is not known in closed form. The ultimate goal is to test our new machinery in a real experiment with a quadrotor UAV. Finally, we have also worked on the problem of taking the process/actuation noise into account in the optimization algorithm. As the CG (or the OG) does not capture the degrading effect of the process/actuation noise on the information collected through the outputs, we have proposed to directly maximize the smallest eigenvalue of the covariance matrix given by the Riccati differential equation of the EKF, which is used as the estimation algorithm. The results of this approach have been submitted to ICRA 2018.
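
For reference, and under the standard assumption of a system linearized along the trajectory, with state-transition matrix $\Phi(t,\tau)$, output matrix $C(t)$ and a possible output weight $W(t)$ (e.g., the inverse of the measurement noise covariance), the two Gramians can be written as

\[
\mathcal{G}_o(t_0,t_f)=\int_{t_0}^{t_f}\Phi^\top(\tau,t_0)\,C^\top(\tau)\,W(\tau)\,C(\tau)\,\Phi(\tau,t_0)\,d\tau,
\qquad
\mathcal{G}_c(t_0,t_f)=\int_{t_0}^{t_f}\Phi^\top(\tau,t_f)\,C^\top(\tau)\,W(\tau)\,C(\tau)\,\Phi(\tau,t_f)\,d\tau ,
\]

so that the OG accumulates the output information mapped back onto the initial state $x(t_0)$, while the CG maps it onto the current/final state $x(t_f)$.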